ABSTRACT

Pattern recognition schemes have been concerned, mainly,
with the problem of identifying alphabetic characters or
numerals. To this end adaptive pattern recognition devices have
been trained to recognize such characters. The topic discussed
here is the training of an adaptive pattern recognition device,
Adaline, to mimic the performance of the controller of a plant.

This study discusses Adaline and the minimum square error
method of adaption. Adaline is trained by observing the behaviour
of the controller in numerous situations. The problem of
coding the information to be presented to Adaline is discussed
and finally, a suitably trained Adaline takes control of the
plant.
INTRODUCTION



The problem of automatic character recognition has received 
a great deal of attention recently. Many schemes have been 
proposed and they can be divided into two main groups, "open 
loop" types and "closed loop" types. Open loop schemes compare 
the character to be recognized with a library stock and make 
a decision on this basis. These schemes are very useful when 
the characters are, essentially, standard — possibly typewritten. 
Closed loop schemes are trained to recognize characters by 
applying as many patterns as possible to the machine while 
"teaching" it to generate the correct answers before asking it 
to operate alone. These schemes have been described by such 
names as "linear decision network" or "adaptive linear neuron" 
(Adaline). Their advantage over open loop schemes is that
they are able to attempt the classification of patterns other 
than those on which the machine has been trained. 

The latter scheme has application to process control.
The human operator of a steel rolling mill is presented with
such data as speed and temperature of the incoming billet.
On the basis of this knowledge and past experience he can
adjust the rolls to produce the desired sheet steel. A pattern
recognition device could be placed beside him and be trained
to recognize suitably coded patterns of information about
the billet and thus to produce the same "response" as
the operator in control of the rolls.
The present investigation considers a simple relay activated
plant, which is already controllable, and considers the training
of a pattern recognition device to produce the same result as
the relay controller. The block diagram of a stable plant
is shown in Figure (a), where $d(t)$ is the control signal which
causes the relay to apply driving power to the plant. If the
linear control system error is defined to be the difference between
desired and actual output at any given time, then a convenient way
of describing the behaviour of a plant is to consider the
system error and its derivatives as functions of time.




Figure (a) 

The pattern recognizer is, therefore, presented with suitably
coded patterns of error, error rate (and higher derivatives
if necessary) and it is then trained to produce an output
signal, $a(t)$, close to $d(t)$. This is indicated in Figure (b).
Training involves presenting the pattern recognizer with as many
typical operating conditions as possible, while adjusting it
so that $a(t)$ matches $d(t)$ as closely as possible over the
entire range of conditions. After the training period the
compensator can be disconnected as the learning machine takes
its place. This is shown in Figure (c).




Figure (c) 

The present study discusses the properties of the pattern
recognition device (in Chapter I) and a specific method of
adaption (in Chapter II). The training of the device, the coding
of error and error rate and the control of the plant by the
learning machine are discussed in Chapter III. The conclusions
appear in Chapter IV.
CHAPTER I



PATTERN RECOGNITION 



An adaptive pattern recognition device of the type to
be considered has four essential components: a sensory unit,
an association unit, a response unit and an adjustment unit.
The whole device is shown in Figure 1. It is expected of
such a device that it can be trained to recognize stimuli or
patterns which are part of the environment in which it is placed.




Figure 1 



The sensory unit is a transducer which produces,
possibly, a set of electrical signals in response to a visual
or audio pattern. If the patterns which are to be recognized
are alphabetical or numerical, then they could be displayed
on a matrix of photo-cells which could, in turn, generate
positive or negative voltages (+1 or -1), depending on the
presence or absence of an element of the pattern. This is shown in
Figure 2.
Throughout the remainder of the discussion the characteristics
of the transducer will be bypassed and the pattern, or input,
for the association unit will be considered to be simply
an array of positive and negative voltages of unit magnitude.

The association unit is a logical decision element
which produces an output on receipt of the input pattern.
It will be assumed to consist of a set of adjustable weights
of number $n$ when the number of elements of the input pattern
is $n$. The $n$ elements of the input pattern are supplied
to the weights $w_1, \ldots, w_n$. The weight, $w_0$, is called
the threshold and its input is fixed at +1. The values of
the weights and threshold are determined by previous training.
In this study all the weights and threshold will have any
past experience removed by setting them to zero before
a training sequence begins. If the elements of a pattern
are supplied to the device, the sum of the outputs of the
weights and the threshold is called the analogue output, $a$.
The function of the response unit is to indicate which
decision has been made on the pattern applied to the sensor.
In this discussion the characteristics of the response unit
will be bypassed and the analogue output, $a$, will be used
as an indication of the pattern which has been applied.

These ideas have been discussed by many authors [1, 2,
3, 4, 5]. Widrow has named the device "Adaline" (Adaptive
linear neuron).

1.1 The Adaptive Mechanism

The problem of pattern recognition using Adaline
could be stated in the following way: given $m$ input
patterns, each having $n$ pattern elements, separate them
into two classes, some of the patterns being mapped to
positive values of analogue output and the remainder
to negative values of analogue output.

Each input pattern can be thought of as a column vector,
$[X]_i$, where

$$[X]_i = \begin{bmatrix} x_{i1} \\ x_{i2} \\ \vdots \\ x_{in} \end{bmatrix}$$

with $x_{ij} = \pm 1$, and $[X]_i$ is the $i$th pattern of $m$.
If the $i$th pattern, $[X]_i$, is applied to the device, the
resulting analogue output, $a_i$, is

$$a_i = w_1 x_{i1} + w_2 x_{i2} + \cdots + w_j x_{ij} + \cdots + w_n x_{in} + w_0 = \sum_{j=1}^{n} w_j x_{ij} + w_0$$

where $w_0$ and $w_j$ are the threshold and
set of weights respectively. To be a competent pattern recognizer
Adaline must be trained, in some way, to make $a_i$ match $d_i$,
where $d_i$ is the required or desired output for the $i$th
pattern. After training, which entails adaption of the weights,
it is hoped that when presented with a pattern, Adaline
will make the correct classification by generating an output
as nearly as possible equal to $d_i$.

If $e_i = d_i - a_i$ is defined as the analogue error,
any training scheme which will make all the $e_i$ small is worth
considering. The adaption scheme studied here attempts to
minimize the sum of the squares of the errors for all the
patterns presented, i.e. the scheme minimizes $\sum_{i=1}^{m} e_i^2$.
This topic is discussed fully in Chapter II. Other adaption schemes
are discussed by Treado [6]. For each pattern, $[X]_i$, the
analogue error, $e_i$, is measured. An equal adjustment is
then made to each weight. This adjustment is proportional
to the error, $e_i$, and is of such a sign as will reduce
$e_i$ and $\sum_{i=1}^{m} e_i^2$. Thus the change to the weight,
$w_j$, at each step is given by:

$$\Delta w_j = \beta e_i x_{ij}$$

and for the threshold, $w_0$, it is:

$$\Delta w_0 = \beta e_i \qquad (1.1)$$

where $\beta$ is a constant and where it will be assumed that the
weights can be varied continuously. The total adjustment
after all patterns have been presented once is:

$$\Delta w_j = \beta \sum_{i=1}^{m} e_i x_{ij}$$

and

$$\Delta w_0 = \beta \sum_{i=1}^{m} e_i \qquad (1.2)$$



The adjustments of equation 1.1 are made if $a_i$ is not equal
to $d_i$. Adjustments could, therefore, continue until the
analogue error, $e_i$, is zero. Hence the minimum of
$\sum_{i=1}^{m} e_i^2$ is found. If training is carried on for
long enough, $\sum_{i=1}^{m} e_i^2$ will be
zero. (See also Chapter II)
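
The adjustment rule of equation 1.1 can be sketched in a few lines of Python. This is a minimal illustration, not the program used in this study: the function names `analogue_output` and `adapt`, the 4-element pattern and the value of the constant are all assumptions chosen for the example.

```python
def analogue_output(weights, threshold, pattern):
    """Weighted sum of the pattern elements plus the threshold weight."""
    return sum(w * x for w, x in zip(weights, pattern)) + threshold


def adapt(weights, threshold, pattern, desired, beta):
    """Apply one adjustment of equation 1.1; return the new weights and threshold."""
    error = desired - analogue_output(weights, threshold, pattern)
    new_weights = [w + beta * error * x for w, x in zip(weights, pattern)]
    new_threshold = threshold + beta * error  # the threshold input is fixed at +1
    return new_weights, new_threshold


# Hypothetical 4-element pattern, desired output +10, weights initially zero.
pattern = [1, -1, 1, 1]
weights, threshold = adapt([0.0] * 4, 0.0, pattern, desired=10.0, beta=0.1)
output_after = analogue_output(weights, threshold, pattern)
```

Because every pattern element is +1 or -1, each of the five adjustable quantities moves by the same magnitude, and the output for this pattern moves from 0 towards the desired value in a single step.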

Three aspects of equation 1.2 will be examined in a
later section:

1. The speed with which the error, $\sum_{i=1}^{m} e_i^2$, is reduced as a
   succession of training patterns is presented and the
   dependence of this speed of convergence on $\beta$.

2. The fact that the error, $e_i$, in response to one
   of the $m$ patterns, may be too large even though
   $\sum_{i=1}^{m} e_i^2$ is acceptably small, but not zero.

3. The fact that the constant, $\beta$, can take on values
   which make $\sum_{i=1}^{m} e_i^2$ diverge. If the constant, $\beta$, takes
   the value $\lambda / (n + 1)$, where $n$ is the number of weights and
   $\lambda$ is a constant, Mays [7] showed that $\lambda < 2$ for
   convergence.

1.2 Separability of Patterns

Adaline can be considered to be the realization of a
linear decision function. Input patterns can be regarded
as sets of points in $n$ space and a linear decision function
is any partitioning of the space by a hyperplane of dimension
$n - 1$. The pattern recognition problem requires the selection
of a set of weights which will define an appropriate hyperplane.
In an adaption scheme which uses the mean square error adaption
rules the weights defining a separating hyperplane are those
which cause $\sum_{i=1}^{m} e_i^2$ and the analogue errors, $e_i$, for each
pattern, $[X]_i$, to be zero. Hence weights must be chosen to
satisfy the following equations:

$$\begin{aligned}
x_{11} w_1 + \cdots + x_{1j} w_j + \cdots + x_{1n} w_n + w_0 &= d_1 \\
x_{21} w_1 + \cdots + x_{2j} w_j + \cdots + x_{2n} w_n + w_0 &= d_2 \\
&\;\;\vdots \\
x_{i1} w_1 + \cdots + x_{ij} w_j + \cdots + x_{in} w_n + w_0 &= d_i \\
&\;\;\vdots \\
x_{m1} w_1 + \cdots + x_{mj} w_j + \cdots + x_{mn} w_n + w_0 &= d_m
\end{aligned} \qquad (1.3)$$



The iterative training scheme of section 1.1, if carried to
its conclusion, will yield a set of weights defined by the
above equations if it is possible to map the patterns,
$[X]_1, \ldots, [X]_m$, to outputs of $d_1, \ldots, d_m$.
If it is possible to map all the patterns, with which
Adaline is trained, to the desired outputs then the patterns
are said to be separable. After training has been completed
with a set of separable patterns, the values found for the
threshold and the weights are fixed at $w_0^T, w_1^T, \ldots, w_n^T$
and the equation of the hyperplane dividing $n$ space is,
therefore,

$$x_1 w_1^T + \cdots + x_j w_j^T + \cdots + x_n w_n^T + w_0^T = 0 \qquad (1.4)$$

where $x_1, \ldots, x_n$ are the coordinate axes of $n$ space.
If the $i$th pattern, $[X]_i$, with coordinates $x_{i1}, x_{i2}, \ldots, x_{in}$
in $n$ space, is now presented, it should yield an analogue
output,

$$a_i = x_{i1} w_1^T + \cdots + x_{in} w_n^T + w_0^T \qquad (1.5)$$

which is approximately equal to the desired output, $d_i$.
Equation 1.5 is the equation of a hyperplane on one side of
the dividing hyperplane given by equation 1.4. Hence another
definition of separability is that a set of patterns can be
said to be separable if one group of them lie on one or more
parallel hyperplanes on one side of, and parallel to, the
dividing hyperplane and the other group lie on one or more
parallel hyperplanes on the other side of, and parallel to,
the dividing hyperplane. If a further restriction is placed
on the minimum mean square error adaption scheme, the definition
of separability is even simpler. Consider an example with $m$
patterns (as described in equation 1.3) where $k$ of them
have desired outputs of $+d$ and $l$ of them have desired outputs
of $-d$ (and $k + l = m$). If the patterns are separable, then
the threshold and weights, $w_0^T, w_1^T, \ldots, w_n^T$, can be found.
Further, after training, and on presenting the $k$ patterns
to these weights, a hyperplane on one side of the dividing
hyperplane is defined. If the $l$ patterns are now presented
a hyperplane on the other side of the dividing hyperplane
is defined. Both hyperplanes are parallel to the dividing
hyperplane and equal distances from it. These ideas can
be clarified if specific examples in 2-space are considered.

In 2-space Adaline consists of two weights, $w_1$ and $w_2$, and
a threshold, $w_0$. In the example illustrated in Figure 4 there
are two patterns:

$[X]_1$: $x_{11} = 1$, $x_{12} = -1$, $d_1 = d$
$[X]_2$: $x_{21} = -1$, $x_{22} = 1$, $d_2 = -d$

From equation 1.3 the equations defining the weights required
for separation are:

$$\begin{aligned}
w_1 - w_2 + w_0 &= d \\
-w_1 + w_2 + w_0 &= -d
\end{aligned}$$

These can be solved only by choosing a value for one of the
weights and solving for the other two. The equations are
consistent¹ but yield an infinite number of dividing lines
which pass through the point, 0.

¹ The rank of the coefficient matrix is equal to the rank
of the augmented matrix.
The equations defining the weights required for separation
are:

$$\begin{aligned}
w_1 + w_2 + w_0 &= d \\
w_1 - w_2 + w_0 &= -d \\
-w_1 + w_2 + w_0 &= -d
\end{aligned}$$

These can be solved and unique values of $w_0$, $w_1$, $w_2$ are
found to be $w_0^T = -d$, $w_1^T = d$, $w_2^T = d$. Hence the equation of
the dividing line is:

$$x_2 + x_1 - 1 = 0$$

The two parallel hyperplanes for each class of patterns are
indicated in Figure 5.
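
The three equations above can be checked numerically. The following is a small sketch using exact fractions; the function name `solve_linear`, the ordering of the unknowns and the choice $d = 1$ are assumptions made for the illustration, and the elimination assumes the system has a unique solution.

```python
from fractions import Fraction


def solve_linear(coefficients, rhs):
    """Solve a square linear system by Gauss-Jordan elimination with exact fractions.

    Assumes the system has a unique solution.
    """
    n = len(rhs)
    rows = [[Fraction(v) for v in row] + [Fraction(b)]
            for row, b in zip(coefficients, rhs)]
    for col in range(n):
        # Find a row with a non-zero entry in this column and move it into place.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]


d = 1  # assumed magnitude of the desired output
# Unknowns ordered (w1, w2, w0); one row per pattern equation.
coefficients = [[1, 1, 1], [1, -1, 1], [-1, 1, 1]]
rhs = [d, -d, -d]
w1, w2, w0 = solve_linear(coefficients, rhs)
```

With $d = 1$ the solution is $w_1 = w_2 = 1$ and $w_0 = -1$, which corresponds to the dividing line $x_2 + x_1 - 1 = 0$ quoted in the text.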

Two inseparable cases are now considered. In the case
which is illustrated in Figure 6 there are four patterns:

$[X]_1$: $x_{11} = 1$, $x_{12} = 1$, $d_1 = d$
$[X]_2$: $x_{21} = 1$, $x_{22} = -1$, $d_2 = -d$
$[X]_3$: $x_{31} = -1$, $x_{32} = 1$, $d_3 = -d$
$[X]_4$: $x_{41} = -1$, $x_{42} = -1$, $d_4 = -d$

The four equations to be solved for the weights are:

$$\begin{aligned}
w_1 + w_2 + w_0 &= d \\
w_1 - w_2 + w_0 &= -d \\
-w_1 + w_2 + w_0 &= -d \\
-w_1 - w_2 + w_0 &= -d
\end{aligned}$$

and they are inconsistent². A dividing line cannot be placed
between them and satisfy the given conditions.
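
The consistency test used in this section (comparing the rank of the coefficient matrix with the rank of the augmented matrix) can be sketched as follows. The function name `rank` and the choice $d = 1$ are assumptions for the illustration; exact fractions are used so that no rounding tolerance is needed.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix by Gaussian elimination with exact fractions."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    rank_count = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rank_count, len(rows))
                      if rows[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[rank_count], rows[pivot] = rows[pivot], rows[rank_count]
        for r in range(rank_count + 1, len(rows)):
            factor = rows[r][col] / rows[rank_count][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank_count])]
        rank_count += 1
        if rank_count == len(rows):
            break
    return rank_count


d = 1  # assumed magnitude of the desired output
# Columns ordered (w1, w2, w0); rows are the four equations of the Figure 6 case.
coefficients = [[1, 1, 1], [1, -1, 1], [-1, 1, 1], [-1, -1, 1]]
desired = [d, -d, -d, -d]
augmented = [row + [b] for row, b in zip(coefficients, desired)]
consistent = rank(coefficients) == rank(augmented)
```

Here the coefficient matrix has rank 3 and the augmented matrix has rank 4, so `consistent` is false, in agreement with the statement that the four equations cannot all be satisfied.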

Another inseparable case is shown in Figure 7. There
are four patterns:

$[X]_1$: $x_{11} = 1$, $x_{12} = 1$, $d_1 = d$
$[X]_2$: $x_{21} = -1$, $x_{22} = -1$, $d_2 = d$
$[X]_3$: $x_{31} = 1$, $x_{32} = -1$, $d_3 = -d$
$[X]_4$: $x_{41} = -1$, $x_{42} = 1$, $d_4 = -d$

Figure 7

The resulting four equations are inconsistent and cannot
be solved for the weights. It is obvious, from Figure 7,
that one line cannot separate the patterns.

² The rank of the coefficient matrix is different from
the rank of the augmented matrix.

It should be emphasised that the examples discussed use
minimum mean square error adaption. Since this scheme requires
convergence to a desired value of output (with the accompanying
precise positioning of the hyperplane) it makes complete
separation of apparently separable patterns (such as shown
in Figure 6) impossible. Other "less precise" schemes, as
discussed by Treado [6], do not encounter some of these
difficulties.

The problem of separability will be mentioned again in
Chapter III where coding is considered.

1.3 Experimental Simulation of Adaline

A computer program was written to simulate Adaline and to test
its ability to separate pattern sets. Two pattern sets were
applied: a group of five 'C's was to produce an output of
+10 and a group of five 'T's to produce an output of -10.
The patterns and resultant inputs to Adaline are shown in
Figure 8. During the training phase one of the patterns was
presented and an appropriate adjustment made to the weights.
Each of the ten patterns was then presented and the error, $e_i$,
in each case was measured. $\sum_{i=1}^{10} e_i^2$ was then calculated. This
process was carried out 100 times. The value of the constant,
$\beta$, in equation 1.2 was initially chosen to be 0.05.
Patterns mapped to +10

(1)        (2)        (3)        (4)        (5)
X X X .    . X X X    X X X X    . X X X    . . . .
X . . .    . X . .    X . . X    . . . X    X . . X
X . . .    . X . .    X . . X    . . . X    X . . X
X X X .    . X X X    . . . .    . X X X    X X X X

(1)    1  1  1 -1  1 -1 -1 -1  1 -1 -1 -1  1  1  1 -1
(2)   -1  1  1  1 -1  1 -1 -1 -1  1 -1 -1 -1  1  1  1
(3)    1  1  1  1  1 -1 -1  1  1 -1 -1  1 -1 -1 -1 -1
(4)   -1  1  1  1 -1 -1 -1  1 -1 -1 -1  1 -1  1  1  1
(5)   -1 -1 -1 -1  1 -1 -1  1  1 -1 -1  1  1  1  1  1

Patterns mapped to -10

(6)        (7)        (8)        (9)        (10)
X X X .    . . . .    . . . .    . . . .    . . . .
. X . .    . X . .    . X X X    . . . X    X . . .
. X . .    . X . .    . . X .    . X X X    X X X .
. . . .    X X X .    . . X .    . . . X    X . . .

(6)    1  1  1 -1 -1  1 -1 -1 -1  1 -1 -1 -1 -1 -1 -1
(7)   -1 -1 -1 -1 -1  1 -1 -1 -1  1 -1 -1  1  1  1 -1
(8)   -1 -1 -1 -1 -1  1  1  1 -1 -1  1 -1 -1 -1  1 -1
(9)   -1 -1 -1 -1 -1 -1 -1  1 -1  1  1  1 -1 -1 -1  1
(10)  -1 -1 -1 -1  1 -1 -1 -1  1  1  1 -1  1 -1 -1 -1

Figure 8

A graph of $\sum_{i=1}^{10} e_i^2$ against number of adaptions is shown
in Figure 9. This has been called a "learning curve" by
Widrow [5]. It indicates how many presentations of the
patterns are necessary before Adaline is able to recognize
all of the patterns with small error. From the learning
curve shown it can be seen that the quantity, $\sum_{i=1}^{10} e_i^2$, has
dropped rapidly, after 20 adaptions, from the large initial
value of 1000 to a value of 4?. This might seem acceptable
but the use of $\sum_{i=1}^{10} e_i^2$ as a criterion is dangerous since it may
be generated almost entirely by one or two unacceptably large
errors. Training can be considered complete only when all
the errors are zero or, if this requires too many adaptions,
when the error for each pattern lies within an acceptable
limit.
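
The experiment can be re-created in outline as follows. This is a sketch under stated assumptions, not the original program: the input vectors are transcribed from Figure 8, the patterns are assumed to be presented cyclically in the order (1) to (10), and the names `train`, `sum_squared_error` and `learning_curve` are hypothetical. The numerical values obtained need not match those reported in this study, since the order of presentation used there is not stated.

```python
# Input vectors transcribed from Figure 8: five 'C's (desired +10), five 'T's (desired -10).
C_PATTERNS = [
    [1, 1, 1, -1, 1, -1, -1, -1, 1, -1, -1, -1, 1, 1, 1, -1],
    [-1, 1, 1, 1, -1, 1, -1, -1, -1, 1, -1, -1, -1, 1, 1, 1],
    [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, -1, 1, -1, -1, -1, -1],
    [-1, 1, 1, 1, -1, -1, -1, 1, -1, -1, -1, 1, -1, 1, 1, 1],
    [-1, -1, -1, -1, 1, -1, -1, 1, 1, -1, -1, 1, 1, 1, 1, 1],
]
T_PATTERNS = [
    [1, 1, 1, -1, -1, 1, -1, -1, -1, 1, -1, -1, -1, -1, -1, -1],
    [-1, -1, -1, -1, -1, 1, -1, -1, -1, 1, -1, -1, 1, 1, 1, -1],
    [-1, -1, -1, -1, -1, 1, 1, 1, -1, -1, 1, -1, -1, -1, 1, -1],
    [-1, -1, -1, -1, -1, -1, -1, 1, -1, 1, 1, 1, -1, -1, -1, 1],
    [-1, -1, -1, -1, 1, -1, -1, -1, 1, 1, 1, -1, 1, -1, -1, -1],
]
PATTERNS = C_PATTERNS + T_PATTERNS
DESIRED = [10.0] * 5 + [-10.0] * 5


def output(weights, threshold, pattern):
    """Analogue output for one pattern."""
    return sum(w * x for w, x in zip(weights, pattern)) + threshold


def sum_squared_error(weights, threshold, patterns, desired):
    """Sum of the squared analogue errors over all patterns."""
    return sum((d - output(weights, threshold, p)) ** 2
               for p, d in zip(patterns, desired))


def train(patterns, desired, beta, adaptions):
    """Present one pattern per adaption, adjust, then record the summed squared error."""
    weights, threshold = [0.0] * len(patterns[0]), 0.0
    curve = [sum_squared_error(weights, threshold, patterns, desired)]
    for step in range(adaptions):
        k = step % len(patterns)  # assumed order of presentation: cyclic
        error = desired[k] - output(weights, threshold, patterns[k])
        weights = [w + beta * error * x for w, x in zip(weights, patterns[k])]
        threshold += beta * error
        curve.append(sum_squared_error(weights, threshold, patterns, desired))
    return weights, threshold, curve


weights, threshold, learning_curve = train(PATTERNS, DESIRED, beta=0.05, adaptions=100)
```

With all weights initially zero every error has magnitude 10, so the first point of `learning_curve` is 1000, the initial value quoted in the text.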

The five 'C's and five 'T's were used again as test
patterns to find the effect of the constant, $\beta$, on the rate
of adaption. Values of $\sum_{i=1}^{10} e_i^2$, after various numbers of
iterations, plotted against $\beta$ are shown in Figure 10. The "best" value
of $\beta$, that producing the smallest $\sum_{i=1}^{10} e_i^2$ after a fixed number
of iterations, is found to be $\beta = 0.06$. If $\beta = \lambda / (n + 1)$ where
$\lambda$ is a constant and $n$ is the number of weights, the best value
of $\lambda$ is 1.02 for an Adaline with 16 weights.

Learning curves for other values of $\beta$ are shown in
Figures 11 and 12. For small values of $\beta$ the adjustments
are small and $\sum_{i=1}^{10} e_i^2$ is reduced with few fluctuations. When
$\beta$ approaches the maximum permissible value, the adjustments
are large and $\sum_{i=1}^{10} e_i^2$ experiences large fluctuations before a
suitably low value is approached. For values of $\beta$ greater
than 0.116 the adjustments are too large and $\sum_{i=1}^{10} e_i^2$ diverges.
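
The observed limit can be compared with a simple single-pattern argument. The following is a sketch under the assumption that one pattern with elements $x_{ij} = \pm 1$ is presented repeatedly and adjusted by equation 1.1; it is offered as an illustration and is not the derivation given by Mays [7]. Primes denote values after one adjustment.

```latex
\begin{aligned}
a_i' &= \sum_{j=1}^{n} \left( w_j + \beta e_i x_{ij} \right) x_{ij} + \left( w_0 + \beta e_i \right)
      = a_i + \beta e_i (n + 1), \\
e_i' &= d_i - a_i' = e_i \bigl( 1 - \beta (n + 1) \bigr), \\
|e_i'| < |e_i| &\iff 0 < \beta < \frac{2}{n + 1}.
\end{aligned}
```

For $n = 16$ this bound is $2/17 \approx 0.118$, which is close to the value of 0.116 above which divergence was observed, and $\beta = 0.06$ corresponds to $\beta (n + 1) = 1.02$.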






The problem of large analogue errors "concealed" by what
seems an acceptably low $\sum_{i=1}^{10} e_i^2$ has been mentioned. In all
cases examined, however, no individual error was ever
prohibitively large. Figure 13 is a table of values of
analogue error corresponding to a value of $\sum_{i=1}^{10} e_i^2$ for various
values of $\beta$ and various numbers of iterations.

Figure 10

Learning Curve (horizontal axis: Number of Iterations)



100 iterations

                           Analogue error, $e_i$
$\beta$  $\sum e_i^2$    C's (desired +10)                   T's (desired -10)
0.09     24.27     ?      1.25   2.45  -0.13  -2.13     0.17   1.00  -0.90   0.11   1.01
0.06     10.9?     1.36   ?      1.31  -1.18   ?       -0.07  -0.56   0.43   0.47   0.03
0.01     43.52     0.58   ?      0.50   ?      1.33     4.74   1.01   0.03  -2.02  -0.13

50 iterations

                           Analogue error, $e_i$
$\beta$  $\sum e_i^2$    C's (desired +10)                   T's (desired -10)
0.09     45.24     ?     -3.23  -0.31   2.25  -0.56    -1.12   1.41   3.47   1.53  -0.156
0.06     17.40     1.33   3.53   1.08   ?     -0.36     0.34  -0.12   0.51   0.55   0.037
0.01     76.21     1.60   ?      1.41   ?      2.14    -3.48  -0.09   4.52  -3.03  -1.21

(? marks an illegible entry.)

Figure 13

If training were continued indefinitely, the errors, $e_i$ and
$\sum_{i=1}^{10} e_i^2$, could, presumably, be brought close to zero. This is
hardly practical, however, and a more realistic approach is
to limit the number of iterations and accept an analogue error.
Here the concept of a "dead zone" could be introduced. Thus,
if the desired value of output is +10, a tolerance is placed
on this value so that if, after training, a pattern produces
an output of, say, +10 ± 3 it would be classed in the +10
group. In addition to making an excessive number of training
iterations unnecessary this may enable Adaline to recognize
patterns which it has not seen during training, such as a
training pattern which has been contaminated by noise. If
Adaline is used with a threshold device, two modifications
could be made. Consider the Adaline shown in Figure 14.

The threshold device yields an output of +1 if its input is
greater than +7 and an output of -1 if its input is more
negative than -7. Only the lower half of the dead zone, or
tolerance, is used here. An analogue input of 10 to the
threshold element is aimed at but, in this case, if values
of output, $a$, are greater than 10, adaption to reduce
this to 10 would seem pointless. A new training scheme would
be to adjust the weights (by the old rule) to make the output
approach 10 from below. If the output is greater than 10
no adaption should take place.
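
The modified scheme can be sketched as follows. The function names are hypothetical, the rule is assumed to apply symmetrically to the -10 class, and the return value 0 for an input inside the band between -7 and +7 is an assumption of this sketch, since the text does not say what the threshold device does there.

```python
def one_sided_adapt(weights, threshold, pattern, desired, beta):
    """Adjust by the rule of equation 1.1 unless the output already lies beyond the target."""
    output = sum(w * x for w, x in zip(weights, pattern)) + threshold
    beyond_target = output >= desired if desired > 0 else output <= desired
    if beyond_target:
        return weights, threshold  # no adaption takes place
    error = desired - output
    new_weights = [w + beta * error * x for w, x in zip(weights, pattern)]
    return new_weights, threshold + beta * error


def threshold_device(analogue_input, limit=7.0):
    """Return +1 above +limit, -1 below -limit, and 0 (no decision, assumed) in between."""
    if analogue_input > limit:
        return 1
    if analogue_input < -limit:
        return -1
    return 0


pattern = [1, -1]  # hypothetical 2-element pattern
kept = one_sided_adapt([6.0, -5.0], 1.0, pattern, desired=10.0, beta=0.1)   # output is 12
moved = one_sided_adapt([2.0, -2.0], 1.0, pattern, desired=10.0, beta=0.1)  # output is 5
```

In the first call the output already exceeds +10 and the weights are returned unchanged; in the second the output is below +10 and the usual adjustment is made.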
CHAPTER II



A NOTE ON MINIMUM SQUARE ERROR ADAPTION



There are many methods of approximating a polynomial by
a straight line or curve. One common method is described by
texts on numerical analysis [8] and receives the name
"minimum square error approximation." If the curve shown in
Figure 15 is a polynomial, $p(x)$, and is approximated by a
straight line, $f(x) = ax + b$, then the constants, $a$ and $b$,
can be found, for which $f(x)$ satisfies the minimum mean square
error criterion, i.e. the sum, $\sum_{i=1}^{m} e_i^2 = e_1^2 + e_2^2 + \cdots + e_m^2$,
where $e_i = p(x_i) - f(x_i)$, must be minimized. If $P = \sum_{i=1}^{m} e_i^2$
then the values of $a$ and $b$ which minimize $P$ can be found
using the techniques of differential calculus.
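
Setting the partial derivatives of the sum with respect to $a$ and $b$ to zero gives two linear equations, whose closed-form solution is sketched below. The function name `fit_line`, the sample points and the polynomial $p(x) = x^2$ are assumptions chosen for the illustration.

```python
def fit_line(xs, ys):
    """Return (a, b) minimizing the sum of (y - (a*x + b))**2 over the sample points."""
    m = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Solution of the two equations dP/da = 0 and dP/db = 0.
    a = (m * sxy - sx * sy) / (m * sxx - sx * sx)
    b = (sy - a * sx) / m
    return a, b


xs = [-1.0, 0.0, 1.0]
ys = [x ** 2 for x in xs]  # assumed polynomial p(x) = x^2
a, b = fit_line(xs, ys)
```

For these three symmetric sample points the best straight line is horizontal, with $a = 0$ and $b = 2/3$.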



At any point during the training phase, when the output
of Adaline in response to these patterns is $a_1, \ldots, a_m$,
the errors are $e_1, e_2, \ldots, e_m$. Adaption requires the
adjustment of the threshold and weights in such a way as to
make $a_1, \ldots, a_m$ approach $d_1, \ldots, d_m$. By carrying
over the concepts of polynomial approximation the sum of the
squares of these errors can be found and, by using this as
a criterion, the weights can be found which minimize $\sum_{i=1}^{m} e_i^2$.

2.1 Analytical Minimization of $\sum_{i=1}^{m} e_i^2$

Before training Adaline is assumed to have the threshold
and weights of value, $w_0, w_1, \ldots, w_j, \ldots, w_n$. If
the $m$ training patterns are now presented the outputs are
given by the following equations:

$$\begin{aligned}
a_1 &= x_{11} w_1 + x_{12} w_2 + \cdots + x_{1j} w_j + \cdots + x_{1n} w_n + w_0 \\
a_2 &= x_{21} w_1 + x_{22} w_2 + \cdots + x_{2j} w_j + \cdots + x_{2n} w_n + w_0 \\
&\;\;\vdots \\
a_m &= x_{m1} w_1 + x_{m2} w_2 + \cdots + x_{mj} w_j + \cdots + x_{mn} w_n + w_0
\end{aligned}$$

The expressions for the errors are:

$$e_1 = d_1 - a_1, \quad e_2 = d_2 - a_2, \quad \ldots, \quad e_m = d_m - a_m$$

The sum, $\sum_{i=1}^{m} e_i^2$, is formed and this must be minimized by
adjusting the threshold and the weights. If $\sum_{i=1}^{m} e_i^2$ is called $S$,
then

$$S = (d_1 - a_1)^2 + (d_2 - a_2)^2 + \cdots + (d_m - a_m)^2$$

Analytically, the values of the threshold and weights which
yield the minimum value of $S$ can be found by taking the
partial derivatives of $S$ with respect to the threshold and
weights and equating the derivatives to zero, i.e.

$$\frac{\partial S}{\partial w_0} = 0, \quad \frac{\partial S}{\partial w_1} = 0, \quad \ldots, \quad \frac{\partial S}{\partial w_n} = 0$$

There are $n + 1$ equations and $n + 1$ unknowns and, if these
equations are consistent, they can be solved for $w_0^T, w_1^T,
\ldots, w_n^T$, the values of threshold and weights which produce
minimum $S$. The equations are:

$$\begin{aligned}
(d_1 - a_1) + (d_2 - a_2) + \cdots + (d_i - a_i) + \cdots + (d_m - a_m) &= 0 \\
(d_1 - a_1) x_{11} + (d_2 - a_2) x_{21} + \cdots + (d_i - a_i) x_{i1} + \cdots + (d_m - a_m) x_{m1} &= 0 \\
&\;\;\vdots \\
(d_1 - a_1) x_{1n} + (d_2 - a_2) x_{2n} + \cdots + (d_i - a_i) x_{in} + \cdots + (d_m - a_m) x_{mn} &= 0
\end{aligned}$$

They can be expressed compactly, with the $j$th equation given
by $\partial S / \partial w_j = 0$ or $\sum_{i=1}^{m} e_i x_{ij} = 0$, valid for $w_1, \ldots, w_n$,
and $\partial S / \partial w_0 = 0$ or $\sum_{i=1}^{m} e_i = 0$ for $w_0$.

It is instructive to consider a few examples of this
technique as applied to patterns in two space. For such patterns
there will be three equations, with three unknowns, to be
used to find the best weights, i.e.

$$\sum_{i=1}^{m} e_i = 0, \quad \sum_{i=1}^{m} e_i x_{i1} = 0, \quad \sum_{i=1}^{m} e_i x_{i2} = 0$$

In the example illustrated in Figure 16 the patterns are:

$[X]_1$: $x_{11} = 1$, $x_{12} = -1$, $d_1 = d$
$[X]_2$: $x_{21} = -1$, $x_{22} = 1$, $d_2 = -d$

The errors are given by:

$$\begin{aligned}
e_1 &= d - w_1 + w_2 - w_0 \\
e_2 &= -d + w_1 - w_2 - w_0
\end{aligned}$$

and the three equations are:

$$\begin{aligned}
w_0 &= 0 \\
w_1 - w_2 &= d \\
w_1 - w_2 &= d
\end{aligned}$$

These equations yield an infinite number of solutions for
the weights so that the weights define an infinite number
of hyperplanes which pass through the point $x_1 = 0$, $x_2 = 0$. If $w_2$ is
chosen to be $d$, then $w_1 = 2d$ and the hyperplane has
equation, $x_2 = -2x_1$. This is the same result as was obtained
in example 1 in section 1.2.

In the example shown in Figure 17 the patterns are:

$[X]_1$: $x_{11} = 1$, $x_{12} = 1$, $d_1 = d$
$[X]_2$: $x_{21} = 1$, $x_{22} = -1$, $d_2 = -d$
$[X]_3$: $x_{31} = -1$, $x_{32} = 1$, $d_3 = -d$

The errors are given by:

$$\begin{aligned}
e_1 &= d - w_1 - w_2 - w_0 \\
e_2 &= -d - w_1 + w_2 - w_0 \\
e_3 &= -d + w_1 - w_2 - w_0
\end{aligned}$$

and the equations to be solved for the weights and threshold
are:

$$\begin{aligned}
w_1 + w_2 + 3w_0 &= -d \\
3w_1 - w_2 + w_0 &= d \\
w_1 - 3w_2 - w_0 &= -d
\end{aligned}$$

Solution yields $w_0^T = -d$, $w_1^T = d$, $w_2^T = d$ and
hence, the equation of the dividing hyperplane is $x_2 = 1 - x_1$.
Again this agrees with the results of example 2 in 1.2.
In the example shown in Figure 18 the patterns are:

$[X]_1$: $x_{11} = 1$, $x_{12} = 1$, $d_1 = d$
$[X]_2$: $x_{21} = 1$, $x_{22} = -1$, $d_2 = -d$
$[X]_3$: $x_{31} = -1$, $x_{32} = 1$, $d_3 = -d$
$[X]_4$: $x_{41} = -1$, $x_{42} = -1$, $d_4 = -d$

The equations for the errors are:

$$\begin{aligned}
e_1 &= d - w_1 - w_2 - w_0 \\
e_2 &= -d - w_1 + w_2 - w_0 \\
e_3 &= -d + w_1 - w_2 - w_0 \\
e_4 &= -d + w_1 + w_2 - w_0
\end{aligned}$$

and the three equations to be solved for the weights are:

$$\begin{aligned}
4w_0 + 2d &= 0 \\
4w_1 - 2d &= 0 \\
4w_2 - 2d &= 0
\end{aligned}$$

and so $w_0^T = -d/2$, $w_1^T = d/2$, $w_2^T = d/2$. The equation
of the hyperplane which separates the two classes of patterns
is $x_2 + x_1 - 1 = 0$. This apparent contradiction of the
result of example 3 of section 1.2 can be explained in the
following way. Minimizing $S$ analytically results in finding
the particular hyperplane which will separate the patterns
into two classes in such a way as to minimize $\sum_{i=1}^{m} e_i^2$. The
equations of section 1.2 describe an adaptive scheme which will
separate patterns with a precisely located hyperplane only if the
error, $e_i$, for each pattern and $\sum_{i=1}^{m} e_i^2$ can be reduced to zero.
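
The analytical minimization for this example can be reproduced by forming the three equations $\sum e_i = 0$, $\sum e_i x_{i1} = 0$, $\sum e_i x_{i2} = 0$ and solving them. The sketch below does this with exact fractions; the function name `least_squares_weights` and the choice $d = 1$ are assumptions for the illustration, and the elimination assumes the equations have a unique solution (which is not the case for the two-pattern example).

```python
from fractions import Fraction


def least_squares_weights(patterns, desired):
    """Solve sum_i e_i * x_ij = 0 for the threshold and weights (threshold input is +1).

    Assumes the resulting equations have a unique solution.
    """
    rows = [[1] + list(p) for p in patterns]  # prepend the fixed +1 threshold input
    size = len(rows[0])
    # Each equation j: sum_i x_ij * (sum_k x_ik * w_k) = sum_i x_ij * d_i.
    system = [[Fraction(sum(r[j] * r[k] for r in rows)) for k in range(size)]
              + [Fraction(sum(r[j] * d for r, d in zip(rows, desired)))]
              for j in range(size)]
    for col in range(size):  # Gauss-Jordan elimination
        pivot = next(r for r in range(col, size) if system[r][col] != 0)
        system[col], system[pivot] = system[pivot], system[col]
        system[col] = [v / system[col][col] for v in system[col]]
        for r in range(size):
            if r != col:
                factor = system[r][col]
                system[r] = [a - factor * b for a, b in zip(system[r], system[col])]
    return [row[-1] for row in system]  # [w0, w1, w2, ...]


d = 1  # assumed magnitude of the desired output
patterns = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
desired = [d, -d, -d, -d]
w0, w1, w2 = least_squares_weights(patterns, desired)
```

With $d = 1$ this gives $w_0 = -1/2$ and $w_1 = w_2 = 1/2$, the values quoted above; the individual errors at this minimum are not zero.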

Finally, in the example shown in Figure 19 the patterns are:

$[X]_1$: $x_{11} = 1$, $x_{12} = 1$, $d_1 = d$
$[X]_2$: $x_{21} = -1$, $x_{22} = -1$, $d_2 = d$
$[X]_3$: $x_{31} = 1$, $x_{32} = -1$, $d_3 = -d$
$[X]_4$: $x_{41} = -1$, $x_{42} = 1$, $d_4 = -d$

Figure 19

The three equations for the weights yield $w_1^T = w_2^T = w_0^T = 0$.
This means, in effect, that no hyperplane exists which will
separate the patterns. This agrees with the result of example 4
in section 1.2.

2.2 Steepest Descent Minimization of $\sum_{i=1}^{m} e_i^2$

If the function, $S = \sum_{i=1}^{m} e_i^2$, has a minimum it can be found
by an iterative procedure. One possible method is described here.
If the threshold and weights have initial values, $w_0, w_1,
w_2, \ldots, w_n$, and these weights do not define the minimum,
then all the patterns, $[X]_1, [X]_2, \ldots, [X]_i, \ldots, [X]_m$, are
presented. The function, $S$, and the $n + 1$ partial derivatives
are calculated. The method of steepest descent involves
evaluating the magnitude of the gradient vector at the point
defined by $w_0, w_1, \ldots, w_n$ and descending a short
distance along the gradient vector towards the minimum of $S$
by adjusting the weights, $w_0, w_1, \ldots, w_n$. The adjustment
made to the $j$th weight is proportional to $\partial S / \partial w_j$ and is
$\Delta w_j = -q \, \partial S / \partial w_j$. The adjustment to the threshold is
$\Delta w_0 = -q \, \partial S / \partial w_0$. $q$ is a
constant of proportionality and affects the size of the increments
to all the weights and the threshold. It should be noted
that the increments, $\Delta w_0, \Delta w_1, \ldots, \Delta w_j, \ldots, \Delta w_n$, are not
necessarily equal. At each new set of values, $w_0, w_1, \ldots,
w_j, \ldots, w_n$, the patterns are again presented and the process
is repeated. This process is continued until the minimum
is found. A memory is required to implement this process.
After presenting each pattern the error, $e_i$, and the $n$ elements
of the $i$th pattern must be stored until all the patterns
are presented so that $S$, $\Delta w_0$ and $\Delta w_j$ can be calculated.
The calculation of $\Delta w_0$, $\Delta w_j$ could be accomplished using
$n + 1$ totaling registers to collect $\sum_{i=1}^{m} e_i$ and $\sum_{i=1}^{m} e_i x_{ij}$.
It is for this reason that a modified form of steepest descent
(which does not require a memory) is used when training Adaline
to find the values of weights which yield a minimum for $S$.
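
The procedure can be sketched as follows, with the $n + 1$ totaling registers represented by `grad_w` and `grad_0`. The function name `steepest_descent`, the value $q = 0.1$, the number of iterations and the choice $d = 1$ are assumptions made for the illustration; the patterns are those of the three-pattern example of section 2.1.

```python
def steepest_descent(patterns, desired, q, iterations):
    """Adjust weights and threshold only after all patterns have been presented."""
    n = len(patterns[0])
    weights, threshold = [0.0] * n, 0.0
    for _ in range(iterations):
        grad_w, grad_0 = [0.0] * n, 0.0  # the n + 1 totaling registers
        for pattern, d in zip(patterns, desired):
            error = d - (sum(w * x for w, x in zip(weights, pattern)) + threshold)
            grad_0 += -2.0 * error  # contribution to dS/dw0
            for j, x in enumerate(pattern):
                grad_w[j] += -2.0 * error * x  # contribution to dS/dwj
        # Step against the gradient, towards the minimum of S.
        weights = [w - q * g for w, g in zip(weights, grad_w)]
        threshold -= q * grad_0
    return weights, threshold


d = 1.0  # assumed magnitude of the desired output
patterns = [(1, 1), (1, -1), (-1, 1)]
desired = [d, -d, -d]
weights, threshold = steepest_descent(patterns, desired, q=0.1, iterations=200)
```

For this separable example the iteration settles on $w_1 = w_2 = d$ and $w_0 = -d$, the values found analytically. The value of $q$ must be small enough for the steps to converge; the value used here was chosen for this particular set of patterns.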






2.3 Modified Steepest Descent

The sum of the squares of the errors, $S$, is given by:

$$S = \sum_{i=1}^{m} e_i^2$$

The adaptive scheme described in this section attempts to
find the weights which yield the minimum for $S$ without the
need for a memory. The first pattern is presented and $e_1^2$
is calculated. This is treated as the function to be minimized
and the steepest descent method is applied to this. Calling
$f_1 = e_1^2 = (d_1 - a_1)^2$, the derivatives are:

$$\frac{\partial f_1}{\partial w_j} = -2 e_1 x_{1j}, \qquad \frac{\partial f_1}{\partial w_0} = -2 e_1$$

It should be noted that the derivatives are equal in magnitude.
As was mentioned in section 2.2, the method of steepest descent
involves making an adjustment, which is proportional to $\partial f_1 / \partial w_j$,
to each of the weights, i.e. $\Delta w_j = \beta e_1 x_{1j}$ and the adjustment
to the threshold is $\Delta w_0 = \beta e_1$, where $\beta$ is a constant.
The adjustments, $\Delta w_0, \Delta w_1, \ldots, \Delta w_n$, are equal in magnitude.
These adjustments are made and then the second pattern is
applied and $f_2 = e_2^2$ is calculated. Equal adjustments are
then made by applying one step of the steepest descent procedure
to $f_2$. This process is repeated until the patterns
have all been presented. If the minimum has not been
found presentation of the patterns and adjustment continues as
described. The "real" steepest descent method uses the function, $S$,
as the criterion. The criterion for the modified method,
however, changes from $f_1$ to $f_2$ to $\ldots$ to $f_m$. If the
constant, $\beta$, is such that the adjustments are very small
then

$$S \approx f_1 + f_2 + \cdots + f_m$$

where $f_i = e_i^2$ and is evaluated
after $i - 1$ adjustments*. The minimum of $S$ can be found
by continuing to apply this method. In many cases the true
minimum (as obtained by the analytical method of examples
1 and 2 of section 2.1) is found but in some cases (as in
example 3 of section 2.1) no hyperplane defining the
minimum is found. The hyperplane oscillates around a mean
plane which would define the minimum of $S$.

The adaption scheme can be defined as follows:

Present the patterns in turn and, after each pattern
is presented, adjust all of the weights and the
threshold by an equal amount in a direction such
that the error will be reduced. Adjustment is to
take place if the error, on application of a pattern,
is not zero.

This scheme is most frequently called "Minimum Square Error
Adaption" in the literature. The main difference between
"Minimum Square Error Adaption" and the method of steepest descent
is that in the former adjustment is made after each pattern is
presented whereas in the latter, all the patterns are presented
before any adjustment can be made.
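
The behaviour of this scheme on an inseparable set can be sketched as follows, using the four patterns of example 3 of section 2.1. The function name `present_all`, the value of the constant, the number of passes and the choice $d = 1$ are assumptions made for the illustration.

```python
def present_all(weights, threshold, patterns, desired, beta):
    """One pass through the patterns, adjusting after each presentation."""
    for pattern, d in zip(patterns, desired):
        error = d - (sum(w * x for w, x in zip(weights, pattern)) + threshold)
        weights = [w + beta * error * x for w, x in zip(weights, pattern)]
        threshold += beta * error
    return weights, threshold


d = 1.0  # assumed magnitude of the desired output
patterns = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
desired = [d, -d, -d, -d]
weights, threshold = [0.0, 0.0], 0.0
for _ in range(500):
    weights, threshold = present_all(weights, threshold, patterns, desired, beta=0.01)

# Errors for each pattern with the final weights.
errors = [dd - (sum(w * x for w, x in zip(weights, p)) + threshold)
          for p, dd in zip(patterns, desired)]
```

With a small constant the weights stay close to the analytical values $w_0 = -d/2$, $w_1 = w_2 = d/2$ but continue to be adjusted on every presentation, because the individual errors cannot be brought to zero for this set of patterns.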



* In "real" steepest descent $S$ would be given by
$S = f_1 + f_2 + \cdots + f_m$ where $f_1, f_2, \ldots, f_m$ were
evaluated before any adjustment was made.






CHAPTER III 



ADALINE IN CONTROL OF A PLANT



3.1 The Equations of the Plant

In this chapter Adaline is taught to recognize the behaviour of a stable plant and to produce the correct control signal. The plant chosen is a second order system which consists of a motor and a load. The power supplied to the motor is controlled by a relay. The system is stable by virtue of unity and velocity feedback. It is shown in Figure 19.




$d(s) = r(s) - K_t\,s\,c(s) - c(s)$

Figure 19

The differential equations of the plant can be obtained, as a function of time, from the block diagram. Hence the output, $c(t)$, and the output rate, $\dot{c}(t)$, can be calculated, as functions of time, for various initial values of $c(t)$ and $\dot{c}(t)$, rather than supplying the plant with various desired outputs, $r(t)$. The voltage which is applied to the plant is $\pm V$ volts, depending on the sign of the relay input, $d(t)$.

From Figure 19,

$$d(t) = r(t) - c(t) - K_t\,\dot{c}(t) \qquad (3.1)$$

and choosing to set $r(t) = 0$ the switching condition for the voltage, $V$, is given by:

$$\dot{c}(t) = -\frac{1}{K_t}\,c(t) \qquad (3.2)$$

The equations for output and output rate are:

$$c(t) = Kt + (c_0 + c_1 - K) + (K - c_1)\,e^{-bt} \qquad (3.3)$$

$$\dot{c}(t) = K - b\,(K - c_1)\,e^{-bt} \qquad (3.4)$$

where $c_0 = c(0)$, the initial value of $c(t)$ at $t = 0$,

$c_1 = \dot{c}(0)$, the initial value of $\dot{c}(t)$ at $t = 0$,

and $K = \pm V$.
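Equations 3.3 and 3.4 can be evaluated directly. The following Python sketch is an illustration only; the function name `plant_state` and its argument names are assumptions. It takes $b = 1$ by default, the value chosen for the control system in this chapter, for which the initial output rate given by equation 3.4 equals $c_1$.

```python
import math


def plant_state(t, c0, c1, K, b=1.0):
    """Output c(t) and output rate c'(t) from equations 3.3 and 3.4.

    c0 and c1 are the initial output and output rate; K is +V or -V
    according to the sign of the relay input.
    """
    decay = math.exp(-b * t)
    c = K * t + (c0 + c1 - K) + (K - c1) * decay
    c_dot = K - b * (K - c1) * decay
    return c, c_dot


# Example: output and output rate 0.5 s after starting from c(0) = -8, c'(0) = -2
# with the relay supplying K = +10.
c_half, c_dot_half = plant_state(0.5, -8.0, -2.0, 10.0)
```

Since equation 3.4 is the time derivative of equation 3.3, a numerical derivative of the first returned value should agree with the second.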

A convenient way of presenting this information is to use the phase plane. If the system error, $e(t)$, is defined as $e(t) = r(t) - c(t)$, where $r(t)$ is the desired output and $c(t)$ the actual output, then a plot of error, $e(t)$, versus error rate, $\dot{e}(t)$, is called a trajectory in the phase plane. Since "inputs" are applied to the system as initial conditions of $c(t)$, $\dot{c}(t)$ with $r(t) = 0$, then $e(t) = -c(t)$, $\dot{e}(t) = -\dot{c}(t)$. Hence equations 3.3 and 3.4 define a trajectory in the phase plane which describes the system behaviour for a given set of initial conditions. Since $K$ can be positive or negative, depending on the sign of the input to the relay, $d(t)$, the trajectory obeys two equations, A and B. This is indicated in Figure 20. The switching condition has been stated in equation 3.2. This is a line with a slope, $-1/K_t$, in the phase plane. The values chosen for the control system were: $K_t = 0.2$, $K = \pm 10$, and $b = 1$.

Using the analytical solutions for $c(t)$ and $\dot{c}(t)$ a computer program was constructed to produce sample trajectories in the phase plane for comparison with the attempts produced by Adaline. One of them is shown in Figure 21.

The objective of training Adaline is to enable it to recognize combinations of $e(t)$ and $\dot{e}(t)$ in order that it may produce a response, $a(t)$, which is close to the correct input to the relay, $d(t)$. During training it is desirable to present many combinations of $e(t)$ and $\dot{e}(t)$, so that when Adaline later acts as a controller it will have enough "experience" (or have seen most of the phase plane) to make a correct decision and produce a good value of $a(t)$ to drive the relay. To train Adaline in a practical situation various initial conditions, $c(0)$ and $\dot{c}(0)$, would be set and the resulting trajectories would generate values of $e(t)$, $\dot{e}(t)$ and $d(t)$, which would be presented to Adaline. This is illustrated in Figure 22. It is unnecessary to follow this procedure exactly in a digital computer simulation. Since it is known that the input to the relay, $d(t)$, is given by $d(t) = e(t) + K_t\,\dot{e}(t)$ everywhere in the phase plane, Adaline is, in this study, presented with a large number of randomly selected points with coordinates $e(t)$ and $\dot{e}(t)$, and is trained to produce an analogue output of value as close as possible to $d(t)$ for these points.

3.2 Coding of the Phase Plane and the Training of Adaline

A practical Adaline is limited in size by having a finite number of weights and the number of weights limits the number of patterns which Adaline can attempt to separate. When the control system responds to a set of initial conditions, the error, $e(t)$, and error rate, $\dot{e}(t)$, vary continuously until a steady state condition is reached. If the values of $e(t)$, $\dot{e}(t)$ and $d(t)$ are sampled frequently, patterns could be assigned to the points with coordinates, $e(t)$ and $\dot{e}(t)$, in the phase plane. This would yield a very large number of patterns and an Adaline with a large number of weights would be required. Although, in this study, the values of $e(t)$ and $\dot{e}(t)$ were sampled frequently, the above situation was avoided by dividing the phase plane into a small number of regions to which certain patterns were assigned. This ensures that only a small number of patterns will be presented to Adaline. It was also necessary to choose the patterns, or codes, in such a way that it will be possible for Adaline to separate groups of the patterns into the appropriate two classes. If this last requirement is met it is said that the patterns are linearly separable.

A linearly separable code which has been used by Widrow is indicated in Figure 23. The variable, X, which can take on an infinite number of values between a and f, is coded by choosing a reference or origin at a on the X axis and dividing the region to the right of this reference into segments. The total number of segments determines the number of digits in the code. The code assigned to each region is obtained by moving the digit 1 one space to the left as the variable, X, moves from one segment to another. This ensures that there is a difference of two digits in the code between any two regions, which may be a factor in explaining the separability of the code (Treado).

    X            CODE
    a ≤ X < b    0001
    b ≤ X < c    0010
    c ≤ X < d    0100
    d ≤ X < f    1000

Figure 23

The phase plane can now be divided into squares, for example, and a pattern assigned to each square. One method of coding $e(t)$ and $\dot{e}(t)$ separately has been discussed. If this were done, a group of points in a square in the phase plane could have a coded value for $e(t)$, and a coded value for $\dot{e}(t)$. A pattern which could be associated with points lying within a given square in the phase plane can be obtained by placing the coded values for $e(t)$ and $\dot{e}(t)$ side by side. This means that if $e(t)$ has a value such that it is in segment 0001, and $\dot{e}(t)$ has a value such that it is in segment 0100, the pattern for the point with coordinates, $e(t)$, $\dot{e}(t)$, in the given square is 00010100.
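The coding just described can be sketched in Python. This is a minimal illustration under assumptions that are not taken from the study: the segments are taken to be of equal width, the coded range is -10 to 10 on each axis, the code for $e(t)$ is placed to the left of the code for $\dot{e}(t)$, and the names `code_segment` and `code_point` are hypothetical.

```python
def code_segment(x, lo, hi, n):
    """Code a value in [lo, hi) as n digits containing a single 1.

    The 1 is at the right for the first segment and moves one place
    to the left for each segment further from the reference at lo.
    """
    width = (hi - lo) / n
    k = int((x - lo) // width)
    k = min(max(k, 0), n - 1)      # clamp values on or beyond the ends
    return "0" * (n - 1 - k) + "1" + "0" * k


def code_point(e, e_dot, lo=-10.0, hi=10.0, n=4):
    """Pattern for a point: the two codes placed side by side."""
    return code_segment(e, lo, hi, n) + code_segment(e_dot, lo, hi, n)


# A point whose error lies in the first segment and whose error rate
# lies in the third segment of a four-segment division.
example = code_point(-7.5, 2.5)
```

Any two codes produced by `code_segment` for different segments differ in exactly two digits, which is the property noted above.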



In all the tests described here the possible initial conditions for the control system were limited so that the maximum absolute value of $e(t)$ and $\dot{e}(t)$ was 10. This section of the phase plane was then divided into squares. By choosing a reference at $e(t) = -10$, $\dot{e}(t) = -10$, $e(t)$ and $\dot{e}(t)$ were suitably coded and hence a pattern was assigned to each square. A division of the phase plane into 25 squares is shown in Figure 24. The fineness with which the phase plane is divided determines the number of digits per pattern and this determines the number of weights in the Adaline.

Figure 24

If the $e(t)$ and $\dot{e}(t)$ axes are divided into $n$ sections each, then the pattern for each square has $2n$ digits. It can be seen that the size of Adaline increases linearly with the fineness of the division. The more alarming feature is that the number of patterns increases with the square of the number of sections, $n$, and the question arises as to whether Adaline is capable of separating $n^2$ patterns into distinct classes when the number of weights is $2n$.
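The growth of the two quantities can be tabulated with a few lines of Python. The function name `grid_size` is an assumption made for this illustration; the values of $n$ are those of the grids of 16, 25, 36 and 144 squares considered in this study.

```python
def grid_size(n):
    """Weights (excluding the threshold) and patterns for n sections per axis."""
    return 2 * n, n * n


for n in (4, 5, 6, 12):
    weights, patterns = grid_size(n)
    print(n, weights, patterns)
```

The ratio of patterns to weights is $n/2$, so it grows without limit as the division is made finer.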

By using various sizes of grids on the $e(t)$, $\dot{e}(t)$ plane, several Adalines were trained (according to the Mean Square Error scheme of Chapter I) to "duplicate" the performance of the control system of section 3.1. As in Chapter I, each of the zeros in the patterns was replaced by the digit -1. A pseudo-random number generator was used to produce values of $e(t)$ and $\dot{e}(t)$ which were then coded. The function, $d(t)$, was calculated from equation 3.1 and, together with the coded values of $e(t)$ and $\dot{e}(t)$, presented to Adaline. After each pattern was presented the weights were adjusted and the next pattern read in. After a suitable training period the weights were fixed and Adaline was tested with one pattern from each square of the grid.
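The training procedure can be sketched as follows. This Python fragment is a simplified illustration and not the program used in this study: the adjustment constant `q`, the number of points, the seed, the equal-width segments and all function names are assumptions, and the adjustment rule is the common error-times-input form.

```python
import random

K_T = 0.2       # tachometer constant of section 3.1
LIMIT = 10.0    # |e| and |e_dot| are limited to 10


def pattern(e, e_dot, n):
    """Pattern of 2n elements: one +1 per coordinate, the zeros replaced by -1."""
    def half(x):
        k = min(int((x + LIMIT) * n / (2 * LIMIT)), n - 1)
        bits = [-1.0] * n
        bits[n - 1 - k] = 1.0           # the 1 moves left as x increases
        return bits
    return half(e) + half(e_dot)


def train(n, points, q, seed=1):
    """Present random points one at a time, adjusting after each."""
    rng = random.Random(seed)
    w = [0.0] * (2 * n + 1)             # w[0] is the threshold
    for _ in range(points):
        e = rng.uniform(-LIMIT, LIMIT)
        e_dot = rng.uniform(-LIMIT, LIMIT)
        x = [1.0] + pattern(e, e_dot, n)
        d = e + K_T * e_dot             # equation 3.1 with r(t) = 0
        err = d - sum(wi * xi for wi, xi in zip(w, x))
        w = [wi + q * err * xi for wi, xi in zip(w, x)]
    return w


def square_signs(w, n):
    """Sign of the output for one point at the centre of each square."""
    side = 2 * LIMIT / n
    centres = [-LIMIT + (k + 0.5) * side for k in range(n)]
    rows = []
    for e_dot in reversed(centres):     # top row (largest e_dot) first
        row = ""
        for e in centres:
            x = [1.0] + pattern(e, e_dot, n)
            a = sum(wi * xi for wi, xi in zip(w, x))
            row += "+" if a > 0 else "-"
        rows.append(row)
    return rows


signs_16 = square_signs(train(4, 20000, 0.002), 4)
```

The boundary between the `+` and `-` entries of `signs_16` is the switching line which this sketch of Adaline has produced for a grid of 16 squares. A small constant and a long training run are used here so that the result does not depend strongly on the last few points presented.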

The output of the trained Adaline, $a(t)$, is positive or negative, depending on which pattern is presented. If a line is drawn in the phase plane separating squares for which $a(t)$ is positive from those with negative $a(t)$, this can be called the switching line which Adaline has produced, and the training of Adaline can be considered as the training of a function generator.

Several points were examined using the scheme which 
has been outlined: 

1. The effect of the adjustment constant on the switching line which is obtained when the training process is complete.

2. The effect of the number of training cycles on the final switching line.

3. Convergence of values of weights and threshold as training proceeds.

4. The capacity of Adaline.

These four points were examined using a digital computer simulation of the coding scheme and Adaline as described in the previous paragraph. A training cycle consists of fixing the adjustment constant and presenting Adaline with patterns generated by a fixed number of points in the phase plane. The training cycle is repeated for various numbers of points and various values of the adjustment constant.

Assuming that Adaline is trained with a very large number of randomly distributed points in the phase plane, the switching line which Adaline will produce, when trained, can be predicted.

For a grid of 16 squares, and hence an Adaline of eight weights and a threshold, Adaline's attempt at duplicating a switching line of slope -5 is shown in Figure 25. For points occurring in all the squares, except those numbered 2, 6, 11 and 15, the input to the relay, $d(t)$, is either positive for all points in a square or negative for all points in the square. There is no difficulty in training Adaline to differentiate between such squares. Points in squares 2, 6, 11 and 15 can, however, present Adaline with conflicting information during the training phase. In square 2, for example, all points, when coded, yield the same pattern, 00101000. Since the switching line passes through this square the points, and hence the pattern, can correspond to both positive and negative values of $d(t)$. Adaline has, therefore, the problem of placing the same pattern into two different classes. Using the assumption of a statistically uniform distribution of points it can be seen that Adaline will see more values of negative $d(t)$ than positive $d(t)$ during training.

Figure 25 (actual line for the controlled plant and predicted line)

After training, therefore, the weights and threshold will have values such that the pattern of square 2 is classed negative. The same argument applies to squares 6, 11 and 15. It is on this basis that a prediction of Adaline's attempt at producing a switching line is made.
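The majority argument can be checked numerically. The Python sketch below is an illustration under stated assumptions: squares are of equal size, each square is sampled on a regular lattice (the `samples` parameter and the function name `predicted_signs` are hypothetical), and a square is marked `?` when positive and negative values of $d(t)$ occur equally often.

```python
K_T = 0.2       # tachometer constant of section 3.1
LIMIT = 10.0    # the grid covers -10 to 10 on both axes


def predicted_signs(n, samples=40):
    """Majority sign of d = e + K_T * e_dot within each square of an n by n grid."""
    side = 2 * LIMIT / n
    rows = []
    for r in range(n - 1, -1, -1):              # top row (largest e_dot) first
        row = ""
        for c in range(n):
            pos = neg = 0
            for i in range(samples):
                for j in range(samples):
                    e = -LIMIT + (c + (i + 0.5) / samples) * side
                    e_dot = -LIMIT + (r + (j + 0.5) / samples) * side
                    d = e + K_T * e_dot
                    if d > 1e-9:
                        pos += 1
                    elif d < -1e-9:             # points on the line are ignored
                        neg += 1
            row += "+" if pos > neg else "-" if neg > pos else "?"
        rows.append(row)
    return rows


prediction_16 = predicted_signs(4)
prediction_25 = predicted_signs(5)
```

With the squares numbered from left to right along each row, starting at the top, the entries of `prediction_16` for squares 2, 6, 11 and 15 follow the majority of the points in each of those squares.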

For a grid of 25 squares, and an Adaline with ten weights and a threshold, the prediction of the switching line is shown in Figure 26. In this case the weights and threshold, after training, should be such that squares 3 and 8 will yield positive output and squares 18 and 23 negative output. The output of Adaline in response to a pattern from square 13 is uncertain. Of the patterns presented to Adaline from square 13, statistically, half of them would correspond to positive values of $d(t)$ and half of them to negative values of $d(t)$.

For a grid of 36 squares and an Adaline of 12 weights and one threshold, the prediction of the switching line is shown in Figure 27. From previous arguments it can be seen that, for squares 9 and 15, Adaline will produce a negative output after training and, for squares 22 and 28, Adaline will produce a positive output. Since the actual switching line bisects squares 3 and 34, the output Adaline will produce, after training, when presented with the patterns of squares 3 and 34 is uncertain.

Examples of the switching lines actually produced by Adaline, after training, are shown in Figure 28. Variations of these switching lines were produced by using different values of the adjustment constant and with different numbers of training cycles. The four points mentioned earlier in the section are now discussed.



The size of adjustment can be considered as affecting the position of the switching line in the following way. For values of the adjustment constant close to or less than the optimum value (as discussed in section 1.3) the switching line produced, after training on a large number of points, is very close to the predicted line. Variations occurred when the values of the constant were greater than the optimum value. When the adjustment is large, the last few patterns presented before training is stopped will have a disproportionately large effect on the weights. The last few patterns may not be distributed evenly among the squares and hence Adaline may be biased and produce a peculiar switching line. Two examples are shown in Figure 29. When the size of adjustment is small this effect is less.

The effect on the switching line of the number of points presented during training is similar to that due to different values of the adjustment constant, but is not so pronounced. The squares which are affected most are those for which, during training, there are both positive and negative values of $d(t)$. The total number of points presented determines the number which occur in a given square. Although the random number generator produces a statistically random spread of points, it may be cut off at a stage when there has been a distinct bias toward one region of the phase plane. The number of points presented during testing varied between 100 and 1000.

In the case of a grid of 16 squares the switching line is unaffected. In the grid of 25 squares, square 13 of Figure 26 yields outputs both negative and positive, depending on the number of training points. In the case of a grid of 36 squares the outputs produced by squares 3 and 34 oscillate between positive and negative values.

When examined during training the weights and threshold are seen to oscillate about a mean value. Given a value of the adjustment constant less than or equal to the optimum value, the weights and threshold oscillate around values which yield a switching line close to the predicted switching line. The oscillation is a result of the presentation of conflicting information from squares cut by the switching line of the plant.

Widrow has stated that "the statistical capacity of Adaline is twice the number of weights." By this 'rule' the Adaline with eight weights would be able to classify the 16 patterns which it receives but the 10- and 12-weight Adalines would have difficulty in attempting to classify the 25 and 36 patterns which they receive. The results obtained indicate that this does not appear to be the case. In an attempt to discover if there is a limit to the number of patterns which Adaline can separate, Adalines with 14, 16, 18, 20, 22 and 24 weights were trained. Several thousand points were presented to each Adaline during training and, in all cases, the switching line produced after training was very close to that predicted. Variations occurred only when squares were cut by the switching line of the control system. Since an upper limit did not appear (an Adaline with 24 weights separated 144 patterns into two classes) further study of Adaline's capacity is indicated.

3.3 Adaline Acting as the Controller 

A digital computer simulated Adaline was now placed in control of the simulated plant described in section 3.1. The desired output, $r(t)$, was set to zero, and 'inputs' were applied by setting initial conditions, $c(0)$ and $\dot{c}(0)$, at the output shaft. Trajectories in the phase plane were obtained for various initial conditions.

A block diagram showing Adaline connected to the plant is shown in Figure 30. The weights and threshold chosen for these tests were those which gave switching lines as close as possible to the predicted lines of Figures 25, 26 and 27.

The essence of the computer calculations is as follows. The initial conditions $c(0)$, $\dot{c}(0)$ are presented to the error, error-rate calculator and yield $e(0)$, $\dot{e}(0)$. The values $e(0)$, $\dot{e}(0)$ are presented to the coder and time, $t$, is set to zero. A pattern, which depends on the square in which the point with coordinates, $e(t)$, $\dot{e}(t)$, lies, is generated and applied to Adaline. Adaline produces an output, $a$, and this causes the output from the relay to assume a definite sign. Using equations 3.3 and 3.4, and a small increment of time (10 ms), $c(t)$ and $\dot{c}(t)$ are calculated. These values define the second point of the phase trajectory. $c(t)$ and $\dot{c}(t)$ are now applied to the error, error-rate calculator and the calculations are repeated. The values of $e(t)$ and $\dot{e}(t)$ are stored for printing and for drawing a graph of $\dot{e}(t)$ against $e(t)$, the phase trajectory. The input to the relay, $a(t)$, is examined after each increment of time. If it has changed sign, the trajectory has crossed the switching line and the sign of $K$ in equations 3.3 and 3.4 is then reversed. Time is set to zero and the initial conditions, $c(0)$, $\dot{c}(0)$, assume the values of $c(t)$ and $\dot{c}(t)$ that existed immediately before switching occurred. Since the increments of time are small, these values of $c(t)$ and $\dot{c}(t)$ are close to the values at the time of switching. The calculations described are continued until it is clear that the trajectory is either converging to zero error or to a steady state value of error.
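The calculation loop can be sketched in Python. This is a simplified illustration and not the original program: the coder and Adaline are replaced by an arbitrary function `controller(e, e_dot)` supplied by the caller, and, instead of resetting time only at a switching instant, the sketch restarts equations 3.3 and 3.4 from the current state at every 10 ms increment, which gives the same state because $K$ is constant over each increment. The names `step` and `simulate` are assumptions.

```python
import math

V = 10.0     # magnitude of K (section 3.1)
DT = 0.01    # increment of time, 10 ms


def step(c, c_dot, K, dt=DT, b=1.0):
    """Advance the plant by dt using equations 3.3 and 3.4 from the current state."""
    decay = math.exp(-b * dt)
    new_c = K * dt + (c + c_dot - K) + (K - c_dot) * decay
    new_c_dot = K - b * (K - c_dot) * decay
    return new_c, new_c_dot


def simulate(controller, c0, c_dot0, steps=2000):
    """Return the phase trajectory as a list of (e, e_dot) points."""
    c, c_dot = c0, c_dot0
    trajectory = []
    for _ in range(steps):
        e, e_dot = -c, -c_dot             # desired output r(t) is zero
        trajectory.append((e, e_dot))
        a = controller(e, e_dot)          # stands in for the coder and Adaline
        K = V if a > 0 else -V            # the relay takes the sign of its input
        c, c_dot = step(c, c_dot, K)
    return trajectory


# Reference run: the conventional controller d = e + K_t * e_dot with K_t = 0.2.
reference = simulate(lambda e, e_dot: e + 0.2 * e_dot, -8.0, -2.0)
```

Passing a function that codes the point and applies a set of trained weights in place of the reference controller would give the trajectory under the control of an Adaline.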

In one test, an Adaline with eight weights and a threshold was used to control the plant. The values of the weights and threshold were those which yielded the switching line of Figure 28(a) and are given in Figure 31. This is the switching line of a relay operated plant, with no velocity feedback. The system in Figure 30 should exhibit the same response. The trajectory in response to initial conditions, $c(0) = -8$, $\dot{c}(0) = -2$, is shown in Figure 31. The response is highly oscillatory, but stable, and the error is, eventually, reduced to zero. A similar response would be obtained for other initial conditions.

In considering an Adaline with ten weights and a threshold, corresponding to the switching line of Figure 28(b), it can be seen that the two vertical portions of the switching line will result in the relay switching early in the course of a trajectory, so that, after switching, the trajectory will tend towards the origin. The horizontal portion of the switching line, however, causes trouble. Consider a trajectory which crosses the switching line. If it is trajectory B of Figure 20, then the relay will change sign and the trajectory becomes trajectory A of Figure 20. Trajectory A then crosses the switching line and switching again occurs. This chattering will continue until the trajectory crosses the vertical line. This is best illustrated by Figure 32. The final solution is damped but settles to a steady state error of 2. Tests were then run with many values of initial conditions (see also Figure 33) but the main features of the trajectories were always as described. It should be noted that all trajectories, except one, will result in chattering and a final steady state error of 2. The exception is illustrated in Figure 34 where the initial conditions are $c(0) = -8$, $\dot{c}(0) = -2$. It can be seen that switching occurs early and, after switching, the trajectory tends towards the origin. The trajectory continues past the $e(t)$ axis, meets the horizontal switching line, chatters and, finally, settles to a steady state error of 2. There will, however, be a trajectory, close to that described, which would arrive at the origin after the relay has switched once.

Tests were also made using an Adaline with 12 weights and a threshold, corresponding to the switching line of Figure 28(c). As in the example with 10 weights and a threshold, the two vertical portions of the switching line will cause early switching and the two horizontal portions are so placed that the relay will chatter until $e(t)$ and $\dot{e}(t)$ reach values such that the trajectory crosses the central, vertical line. The trajectory will, in all cases, settle to a steady value of zero error and error rate. This is illustrated in Figure 35, for which the initial conditions are $c(0) = 8$, $\dot{c}(0) = 2$. It can be seen that certain values of initial conditions would result in a trajectory switching only on the central portion of the switching line. This is illustrated in
Figure 36, for which the initial conditions are $c(0) = 2$, $\dot{c}(0) = 5$.






Finer division of the phase plane would result in 
more accurate reproduction of the desired switching line of the 
controlled plant, but Adaline's attempt would still consist 
of vertical and horizontal portions. The examples given 
show the type of response which can be expected. 
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Figure 33 
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Figure 35 




Figure 36 



CHAPTER IV 



CONCLUSIONS 



The work described in Chapter III has shown that it 
is possible for an Adaline to learn to control a simple 
plant. Its ability as a controller is poor, however, and 
in this section methods of improving the basic scheme are 
considered. The concept of training a network to perform
a desired function is also discussed.

4.1 Adaline and a Dead Zone 

In all cases discussed in section 3.3 the combination of Adaline and the plant resulted in a stable control system, but Adaline's control ability was poor in regions near the origin of the phase plane, with either lightly damped oscillations about the origin or lightly damped oscillations about a steady state error. One method of eliminating this would be to use Adaline in conjunction with a relay with a small dead zone. When the trajectory enters the central square, or squares, Adaline would be disconnected and, with the relay in the dead zone, the output shaft of the plant would coast. This method would probably yield a better response than can be obtained using Adaline with an ideal relay. At worst a limit cycle condition could persist but, if the central square were small, this could be tolerated. In the physical realization of this scheme, a source of difficulty would lie in deciding when the trajectory leaves and enters the central square so that Adaline could be connected or disconnected.

4.2 Coarse and Fine Division of the Phase Plane 

Another method of improving Adaline's response near the origin could consist of dividing the central square, or squares, into the same number of squares as the major grid. The pattern codes of squares in the large grid could then also be assigned to corresponding squares in a fine grid. This is illustrated in Figure 37 for a division of the phase plane into 25 squares. If the weights and threshold correspond to the switching line of Figure 28(b), by referring to section 3.3, it can be seen that most trajectories will now finish with a steady state error of 2/5, instead of 2, since the grid is now five times finer near the origin. This scheme can be realized by altering the coding box in Figure 30. When the trajectory enters the central square, the values of $e(t)$ and $\dot{e}(t)$ are amplified by 5 before being applied to the coder. This is indicated in Figure 38. As in the scheme of section 4.1, difficulties would arise in deciding when the trajectory enters and leaves the central square. It should also be noted that this method is applicable only when Adaline is trained on a linear switching line.
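The altered coding box can be sketched in Python. The fragment below is an illustration under assumptions: the grid has 5 equal sections per axis over the range -10 to 10, the coder is reduced to returning the segment numbers rather than the digit patterns, and the names `segment` and `coarse_fine_segments` are hypothetical.

```python
LIMIT = 10.0   # the grid covers -10 to 10 on both axes
N = 5          # sections per axis (a grid of 25 squares)


def segment(x, n=N):
    """Index of the segment containing x, counted from the reference at -LIMIT."""
    return min(max(int((x + LIMIT) * n / (2 * LIMIT)), 0), n - 1)


def coarse_fine_segments(e, e_dot, n=N):
    """Segment numbers after amplifying points that lie in the central square."""
    half = LIMIT / n                      # half-width of the central square (n odd)
    if abs(e) < half and abs(e_dot) < half:
        e, e_dot = n * e, n * e_dot       # amplify by n before coding
    return segment(e, n), segment(e_dot, n)


inner = coarse_fine_segments(1.0, 0.0)    # inside the central square
outer = coarse_fine_segments(5.0, 0.0)    # corresponding point of the major grid
```

A point in the central square is given the segment numbers of the corresponding square in the major grid, so `inner` and `outer` above are equal and no additional weights are required.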

4.3 Polar Coding Scheme 

In Chapter III the controlled plant had a switching line which Adaline found difficult to reproduce due to the coarseness of the division of the phase plane. One possible approach to the division problem, in the case of this specific plant, would be to divide the phase plane into regions bounded by concentric circles, with their centre at the origin, and by radii extending from the origin. The resulting 'curved' rectangles are those to which patterns can be assigned using a coding scheme such as that discussed in section 3.2. This is best illustrated by considering the example with six angular divisions and six radial divisions in Figure 39.

If a point with coordinates, $e(t)$, $\dot{e}(t)$, in the phase plane is to be coded and assigned a pattern, it is first necessary to convert the coordinates of the point into polar coordinates. This complication is trivial as far as a computer simulation is concerned, but if hardware implementation is to be considered, the additional complexity would make this method of dividing the phase plane less attractive than rectilinear division.
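The conversion to polar coordinates and the selection of a region can be sketched in Python. This is an illustration under assumptions not taken from the study: the rings are of equal width out to a radius of 10, the angular divisions are equal and cover the full circle starting from the positive $e(t)$ axis, and the name `polar_segments` is hypothetical.

```python
import math


def polar_segments(e, e_dot, n_radial=6, n_angular=6, r_max=10.0):
    """Radial and angular segment numbers of a point in the phase plane."""
    radius = math.hypot(e, e_dot)
    angle = math.atan2(e_dot, e) % (2 * math.pi)     # 0 <= angle < 2*pi
    r_idx = min(int(radius * n_radial / r_max), n_radial - 1)
    a_idx = min(int(angle * n_angular / (2 * math.pi)), n_angular - 1)
    return r_idx, a_idx


region = polar_segments(0.0, 5.0)
```

Each of the two segment numbers can then be coded with a single moving digit in the manner of section 3.2, giving 12 digits per pattern for six radial and six angular divisions.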

Using a digital computer simulation, an Adaline with 12 weights and a threshold was trained on the controlled plant of section 3.1. The program was very similar to that described in section 3.2. A pseudo-random number generator produced the coordinates, $e(t)$, $\dot{e}(t)$, of a point in the phase plane. They were then converted to polar coordinates, and presented to the coder. The resulting pattern and $d(t)$, as calculated from equation 3.1, were presented to Adaline and adjustments were then made to the weights. The calculation was then repeated for different points. After a few training cycles the weights were fixed and the response of Adaline, when presented with a pattern from each square, was examined.

The switching line produced, after training with 700 points and an adjustment constant of 0.03, is shown in Figure 40. As has been discussed in section 3.2, the distribution of points affects the training, particularly the response from patterns corresponding to squares or areas cut by the switching line of the plant. The switching line produced, after training with 300 points and an adjustment constant of 0.03, is shown in the next figure.

It can be seen that an Adaline adequately trained in this manner would be able to control the plant. The results of a computer program, using the weights and threshold which correspond to the switching line of Figure 40, verified this. The phase trajectory was identical to that of a second order plant, compensated with unity and velocity feedback, and with the tachometer constant determined by the slope of the switching line in Figure 40.

4.4 General Remarks

The main purpose of this study has been to verify that a pattern recognition device, if suitably trained, can take the place of the controller of a plant. Some of the possibilities of control using Adaline, and some of the associated problems, have been revealed in the course of this work.

It is instructive to consider the peculiar nature of this particular recognition problem and to this end comparison is made with a simple form of the weather forecasting scheme discussed by Hu. Weather maps, containing information on barometric pressure over a wide area, are the source of the patterns to be presented to Adaline. The weather (wet or dry) on the following day at location B is the "desired output." The map is divided into squares and the squares generate +1 or -1 pattern elements according as the pressure is higher or lower than normal. The pattern, consisting of the array of pattern elements, and the resulting weather at B are presented to Adaline. If wet is assigned to -10 and dry to +10, Adaline is trained to produce the appropriate output. Training consists of presenting Adaline with many weather maps, and the resulting conditions at B. After training is complete, Adaline can, given a weather map, estimate the weather on the following day. In this recognition problem, given a large amount of information (over the area of the map), the response at a particular place can be evaluated. In the control system, the recognition problem is the following: given information on the error and error rate of the plant at an instant (the coordinates of a point in the phase plane) what is the input to the plant to force the error and the error rate to tend to zero? It is not possible in this case to examine the overall situation to assist in making the correct decision. To date this problem has been approached by assigning patterns to regions of the phase plane, deciding into which region a point falls, and making a decision on receipt of this pattern.

It can be seen that a very small amount of information is presented to Adaline at each stage. A possible area of study would be to search for a better way of presenting the information about the point, and possibly about the desired final point, to Adaline at each stage. For the control problem studied the specific difficulties encountered are now discussed.

As has been mentioned, Adaline found it difficult to
reproduce the linear switching line of the controlled plant
because of the coarseness of division of the phase plane.
It would, however, require a very fine division for Adaline
to be able to reproduce a line accurately. The resulting
size of the Adaline would make practical realization, using
adaptive elements, prohibitively expensive even where only
two states are coded. The polar coding scheme offers a
partial solution to the problem of reproducing this
particular switching line. There may also be cases in which
the controlled plant has an unknown and complex switching
line, which would be better tackled by a polar division
than by a rectilinear division. The coarseness of the
division, and the resulting undesirable features of the
trajectory, pose a difficult problem for which a solution
is not immediately obvious.

Directly associated with the problem of the coarseness
of the divisions is the rapid increase in the number of
patterns, and in the number of digits per pattern, with
increasing fineness of the quantization. It was shown,
experimentally, that 144 patterns generated in the quantized
phase plane could be successfully separated using 24 weights
and a threshold. The statement on the capacity of Adaline
by Widrow [j^"3, discussed in section 3.2, deals, presumably,
with all possible patterns which can be generated by
permutations of the digits applied to Adaline. In presenting
Adaline with 144 patterns from the phase plane, only a small
and carefully controlled sample of the total number of
patterns which can be obtained from the permutations of the
24 digits is used.
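
The growth in patterns and digits with finer quantization can
be illustrated by a short calculation. The sketch below is
illustrative only. It assumes that each of the two phase-plane
axes is divided into the same number of levels and coded with
one digit per level, an assumption chosen because it gives 144
patterns and 24 digits for 12 levels per axis; the coding
actually used may differ, and the function name is hypothetical.

```python
def quantization_cost(levels_per_axis: int, axes: int = 2) -> dict:
    """Count patterns and digits for a quantized phase space.

    Assumption (not taken from the text): every axis is split into
    the same number of levels and coded with one digit per level.
    """
    patterns = levels_per_axis ** axes   # one pattern per cell of the grid
    digits = levels_per_axis * axes      # digits presented to Adaline
    possible = 2 ** digits               # all two-valued digit patterns
    return {
        "patterns": patterns,
        "digits": digits,
        "possible": possible,
        "fraction_used": patterns / possible,
    }


if __name__ == "__main__":
    # Halving the cell width doubles the digits but quadruples the patterns.
    for n in (6, 12, 24, 48):
        cost = quantization_cost(n)
        print(n, cost["patterns"], cost["digits"],
              f"{cost['fraction_used']:.3e}")
```

Under this assumption, 12 levels per axis give 144 patterns out
of 2^24 = 16,777,216 possible digit patterns, which is the sense
in which only a small sample of the permutations is used.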



In this study Adaline was trained on a plant which was
known to perform well with a conventional controller. It did,
in fact, prove possible to train Adaline to act as a (rather
crude) controller, but if this is to be more than an academic
exercise, it must be asked whether the idea can be applied
to more complicated situations. If Adaline is to be considered
seriously as a controller in some given situation, it must
be able to control competently, and it must show a definite
advantage over conventional controllers. There is one such
area where Adaline might be used: where the aim is to duplicate
the response of a human operator, or, more generally, where
the controller must learn to control without information
about the dynamics of the plant and its existing controller.
In the case of the human operator the main problem would be
to present Adaline with the correct signals. The operator can
be considered as having a switching surface (or hypersurface)
which depends on his past experience in acting on observations
of many variables. It may prove possible to train Adaline to
duplicate his performance so that Adaline could eventually
take over the operator's task. A by-product of this process
may be the identification of some of the characteristics of
the controller by examining Adaline's weights at the end of
the training cycle.
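
As a reminder of the form such training takes, the sketch below
shows one normalized Widrow-Hoff (minimum square error)
correction applied to a coded pattern and the switching decision
recorded from the controller or operator. The function names,
the step size alpha, the number of cycles and the +1/-1 coding
are illustrative assumptions, not the exact procedure used in
this study.

```python
from typing import List, Sequence, Tuple


def adaline_sum(weights: Sequence[float], pattern: Sequence[float]) -> float:
    """Analog sum; weights[0] is the threshold weight on a fixed +1 input."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], pattern))


def adaline_decision(weights: Sequence[float], pattern: Sequence[float]) -> int:
    """Two-state output (+1 or -1) obtained by quantizing the analog sum."""
    return 1 if adaline_sum(weights, pattern) >= 0.0 else -1


def lms_step(weights: Sequence[float], pattern: Sequence[float],
             desired: float, alpha: float = 0.5) -> List[float]:
    """Return the weights after one normalized LMS correction.

    The error on the presented pattern is reduced by the fraction alpha.
    """
    error = desired - adaline_sum(weights, pattern)
    inputs = [1.0, *pattern]                    # threshold input first
    norm = sum(x * x for x in inputs)
    return [w + alpha * error * x / norm for w, x in zip(weights, inputs)]


def train(weights: Sequence[float],
          samples: Sequence[Tuple[Sequence[float], float]],
          alpha: float = 0.5, cycles: int = 50) -> List[float]:
    """Present every (pattern, observed decision) pair once per cycle."""
    current = list(weights)
    for _ in range(cycles):
        for pattern, desired in samples:
            current = lms_step(current, pattern, desired, alpha)
    return current
```

In use, each sample would pair the coded state of the plant with
the decision observed from the operator, and the weights returned
by `train` are the ones that would be examined afterwards.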

The problem of training Adaline to act as the controller
of an unknown and, as yet, uncontrolled plant is more difficult.
A set of weights could be placed in Adaline, and the phase
space, with certain of the plant variables as its coordinate
axes, could be quantized and coded. On starting the plant
from certain initial conditions, Adaline would produce a
response and drive the plant. As the resulting trajectory
passes through different coded regions, different responses
would be produced which would force the plant to behave in
a certain way. If the resulting trajectory is undesirable,
Adaline must be adjusted in some way to produce a satisfactory
trajectory. The main problem would be to devise a suitable
training procedure.

Although the possible uses for Adaline seem plausible,
they depend on solving the problem of Adaline's poor response
in certain areas of the phase plane. The schemes mentioned in
sections 4.1 and 4.2 for improving Adaline's response near
the origin do not attack the problem at its source, and an
attempt should be made to improve the performance of Adaline
itself. To date the information on the state of the plant
has been presented to Adaline as a pattern after some form
of coding. An area of research would be to consider more
sophisticated ways of presenting Adaline with information
about the state of the plant and, possibly, presenting
additional information about the desired final state.